

AD-A144 593  HIERARCHICAL MULTISENSOR IMAGE UNDERSTANDING (U)  1/1

HONEYWELL SYSTEMS AND RESEARCH CENTER, MINNEAPOLIS MN
R K AGGARWAL  JUL 84  AFOSR-TR-84-0639  F49620-83-C-0134
UNCLASSIFIED  F/G 9/4  NL




AFOSR-TR. 


HIERARCHICAL  MULTISENSOR 
IMAGE  UNDERSTANDING 


TECHNICAL  REPORT 
AFOSR F49620-83-C-0134
ANNUAL  REPORT  FOR  PERIOD 
OCTOBER  1983  -  SEPTEMBER  1984 

JULY  1984 


Honeywell 

SYSTEMS  &  RESEARCH  CENTER 

2600 RIDGWAY PARKWAY
MINNEAPOLIS, MINNESOTA 55413


AIR  FORCE  OFFICE  OF  SCIENTIFIC  RESEARCH, 
AIR FORCE SYSTEMS COMMAND
UNITED STATES AIR FORCE


84  08  17  066 


UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE

REPORT DOCUMENTATION PAGE

1a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED
1b. RESTRICTIVE MARKINGS:
2a. SECURITY CLASSIFICATION AUTHORITY:
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE:
3.  DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release;
    distribution unlimited.
4.  PERFORMING ORGANIZATION REPORT NUMBER(S):
5.  MONITORING ORGANIZATION REPORT NUMBER(S): AFOSR-TR-84-0639

6a. NAME OF PERFORMING ORGANIZATION: Honeywell, Inc.
6b. OFFICE SYMBOL (If applicable):
6c. ADDRESS (City, State and ZIP Code): Honeywell Systems and Research,
    2600 Ridgway Parkway, Minneapolis MN 55413
7a. NAME OF MONITORING ORGANIZATION: Air Force Office of Scientific Research
7b. ADDRESS (City, State and ZIP Code): Directorate of Mathematical &
    Information Sciences, Bolling AFB DC 20332
8a. NAME OF FUNDING/SPONSORING ORGANIZATION: AFOSR
8b. OFFICE SYMBOL (If applicable): NM
8c. ADDRESS (City, State and ZIP Code): Bolling AFB DC 20332
9.  PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER: F49620-83-C-0134
10. SOURCE OF FUNDING NOS.:
    PROGRAM ELEMENT NO.: 61102F
    PROJECT NO.:
    WORK UNIT NO.:

11. TITLE (Include Security Classification): HIERARCHICAL MULTISENSOR IMAGE
    UNDERSTANDING
12. PERSONAL AUTHOR(S): Raj K. Aggarwal
13a. TYPE OF REPORT: Interim
13b. TIME COVERED: FROM 1/7/83 TO
14. DATE OF REPORT (Yr., Mo., Day): JUL 84
15. PAGE COUNT: 54
17. COSATI CODES (FIELD, GROUP):
18. SUBJECT TERMS (Continue on reverse if necessary and identify by block
    number): Image processing; image understanding; artificial
    intelligence; scene analysis; attributed graphs.

19. ABSTRACT (Continue on reverse if necessary and identify by block number)

This report describes the research results on Honeywell's Hierarchical Multisensor Image
Understanding program. Honeywell is developing a unified framework for the different
hierarchical levels of image processing such as segmentation, detection, classification,
and identification of outdoor scenes and across different sensor modalities such as
millimeter wave, infrared, and visible. Current activities on the project are reviewed
under the following headings: (1) AI-based generic image segmentation and object
recognition; (2) evidence-confidence paradigms for image understanding; (3) hierarchical
systems theory for control structures; and (4) invariant methods in image understanding.


20. DISTRIBUTION/AVAILABILITY OF ABSTRACT:
    UNCLASSIFIED/UNLIMITED [X]  SAME AS RPT. [ ]  DTIC USERS [ ]
21. ABSTRACT SECURITY CLASSIFICATION: UNCLASSIFIED
22a. NAME OF RESPONSIBLE INDIVIDUAL: Dr. Robert N. Buchal
22b. TELEPHONE NUMBER (Include Area Code): (202) 767-4939
22c. OFFICE SYMBOL: NM

DD FORM 1473, 83 APR        EDITION OF 1 JAN 73 IS OBSOLETE.

SECURITY CLASSIFICATION OF THIS PAGE


Hierarchical Multisensor
Image  Understanding 
Annual  Report 

1  October  1983  -  30  September  1984 
Contract  F49620-83-C-0134 
Honeywell  Systems  and  Research  Center 
Minneapolis, Minnesota 55413


ABSTRACT 

This report describes the research results on Honeywell's Hierarchical
Multisensor Image Understanding program. Honeywell is developing a unified
framework for the different hierarchical levels of image processing such as
segmentation, detection, classification, and identification of outdoor scenes
and across different sensor modalities such as millimeter wave, infrared, and
visible. Current activities on the project are reviewed under the following
headings: (1) AI-based generic image segmentation and object recognition; (2)
evidence-confidence paradigms for image understanding; (3) hierarchical systems
theory for control structures; and (4) invariant methods in image
understanding.


INTRODUCTION 


This project is concerned with the study of a formal methodology for
multisensor image understanding. It is being conducted under Contract
F49620-83-C-0134 (AFOSR), monitored by Dr. Robert Buchal. Dr. King-Sun Fu
and Mr. M. Eshera, both of Purdue University, are collaborating on some
aspects of generic scene segmentation.

Conventionally, multisensor systems are treated as a set of domain-
specific subsystems (each optimized for one sensor domain such as
infrared, visible, or millimeter wave) that are integrated with each other
only at the final output stage of processing. Honeywell is developing a
unified framework for the flow of information and control between
different hierarchical image processing levels (such as gradient, texture,
context, etc.) and across different sensor modalities.


We are taking a multidisciplinary approach to the development of the
unified framework. We are studying perceptual and physical invariants,
developing an understanding of their mappings into different sensory
domains at different representational levels, and developing machine
intelligence techniques for image processing based on the invariants.
Previously, we had developed image pixel-level concomitant processing
for simultaneous millimeter wave and infrared imagery and for simultaneous
laser intensity and range imagery. This gave us some understanding of the
issues involved in multisensor image information integration. We also had
previously developed knowledge-based feed-forward control for scene
segmentation in different infrared images with diverse characteristics.

We are now

(1) developing a functional model for the bidirectional (feedback and
feed-forward) control of information flow, based in part on the human
visual and perceptual system;

(2) further analyzing and comparing image formation processes for
multisensor vision;

(3) switching across different sensor modalities using physical scene
invariants based on two-dimensional normalization filters; and

(4) integrating the mode switching and level transition frameworks via
an appropriate production-rule structure with loose coupling of the
hierarchical processing modules.

Thus far in the project, we

(1) have developed a successful context-independent scene
segmentation approach which, unlike conventional approaches, does
not depend on specific object models;

(2) have developed a dynamic spatio-temporal knowledge representation
method that provides the knowledge base for multisensor vision
control;

(3) have developed a hierarchical planner for control of information
flow to automatically determine the optimum sequence of image
processing operations and parameter values; and

(4) are developing a novel evidence accrual paradigm based on graphs
with attributed lists as nodes and image processing operators as
arcs.

This report reviews activities on the project during the period 1 October
1983 - 30 September 1984. This work is covered under the headings of
segmentation and recognition; evidence accrual; control of information
flow; and invariant methods. The work is only summarized here since it is
covered in greater detail in individual technical reports and conference
papers (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11).

2.  AI-BASED  GENERIC  IMAGE  SEGMENTATION  AND  OBJECT  RECOGNITION 

2.1 Context Independent Segmentation Inference Engine (CISIE)


Current generation image understanding systems cannot perform machine
vision tasks in a wide variety of contexts (environments, conditions)
without parameter modification. We have demonstrated the feasibility of
autonomously processing digital imagery to discriminate regions which
correspond to components of objects or areas of background terrain.
Furthermore, the region discrimination was to be performed using only the
information content of the original image. Thus, the system could be
operated without restricting the context of input imagery. This is a
significant advance in the state of the art. Algorithms have been
implemented in the image research laboratory and their performance
measured on a database of tactical Forward Looking Infrared (FLIR) imagery
typical of that used to test target acquisition systems.

The key milestones were:

1. Development of a set of combining rules for image primitives such
as edges, bright blobs, and contours.

2. Development of a consistent set of conflict resolution rules to
resolve region conflicts between different "combined" images.

3. Laboratory demonstration of a rule-based region discrimination
concept.

Technical Approach - The first stage of an image processing system
extracts image primitives such as edges, textures, and contours. Our
approach is to apply rules which discriminate the structural regions in
the original scenes based on the spatial coincidence of the various image
primitives. These rules depend only on the primitives derived from the
image, not on knowledge of expected scene content (e.g., tanks or roads).
Thus, the second stage of CISIE combines a set of image primitives to
yield a single labeled image. The third and final stage applies a set
of conflict resolution rules to a set of labeled images, each of which has
been produced by processing the same image through a different combining
procedure. Conflict resolution yields a region-discriminated image. The
approach is illustrated in Figure 1.

Key Accomplishments - We have succeeded in demonstrating the feasibility of
our technical approach for CISIE. Reasonable region discrimination was
performed on FLIR imagery whose content and image quality varied widely,
without the use of contextual information such as knowledge of scene
objects or range to ground.

Milestone  Specific  Results 

o Milestone 1 - Eleven processes for combining image primitives were
considered. Three combining processes were rejected because of
expense of implementation. We experimented with the other eight.
Four of these proved to require context-based information in order
to perform reasonable region discrimination. The remaining four
combining processes were a homogeneity operator, which finds
regions of little intensity change; an inhomogeneity operator,
which finds coarse boundaries; the Imaging Sensor Autoprocessor's
(ISA) texture boundary locator, which finds changes in texture; and
ISA's prototype similarity transformation, which finds areas of
common texture. Figure 2B shows the results of these four
operators on the original FLIR image pictured in Figure 2A. These
procedures define the combination of primitives specified in
milestone 1. Note that a procedural (algorithmic) rather than
declarative (rule-based) implementation was chosen because of the
high computational overhead of pixel-level rule-based decisions.

Each of the four combinations of primitives results in a
(possibly different) initial discrimination of the image into
significant areas. This results in four labeled images.

o Milestone 2 - A set of heuristic rules codifying the behavior of
each of the operators, both individually and relative to each
other, was used to formulate heuristics for conflict resolution
between area discriminations. Each area in each of the labeled
images was viewed as a characteristic (i.e., "on-off") function in
order to achieve a novel, real-time implementation of the conflict
resolution rules. Each pixel in each labeled image is labeled
"on" if it is part of a region, and "off" if it is part of a
boundary which separates regions. Thus each labeled image is
binarized. The four binarized labeled images are put in adjacent
bit planes in image memory, forming a new four-bit image. The
heuristic rules are then coded into a look-up table mapping the
four-bit image to a binary image. This performs conflict
resolution in real time. The results of conflict resolution are
shown in Figure 2C.

o Milestone 3 - The CISIE approach was tested on eight digital FLIR
images which ranged in complexity from single isolated targets in
a relatively homogeneous field to highly cluttered imagery of
power plants and general terrain. Reasonable context-free region
discrimination was demonstrated. The process is illustrated in
Figure 2 on a FLIR image of a power plant.
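
The bit-plane and look-up-table step described under Milestone 2 can be
sketched as follows. This is only a minimal illustration in Python, not the
implementation used on the project: the function names, the ordering of the
operators within the four-bit code, and the majority-vote rule used to fill
the table are all assumptions, since the report does not list its heuristic
rules.

```python
# Hypothetical sketch of look-up-table conflict resolution.
# Each labeled image is binary: 1 = region ("on"), 0 = boundary ("off").

def pack_bit_planes(planes):
    """Stack binary images into one image whose pixels are multi-bit codes."""
    rows, cols = len(planes[0]), len(planes[0][0])
    packed = [[0] * cols for _ in range(rows)]
    for bit, plane in enumerate(planes):
        for r in range(rows):
            for c in range(cols):
                packed[r][c] |= (plane[r][c] & 1) << bit
    return packed


def build_majority_lut(n_planes=4):
    """Assumed rule: output "on" when at least half the operators say "on"."""
    return [1 if 2 * bin(code).count("1") >= n_planes else 0
            for code in range(1 << n_planes)]


def resolve_conflicts(planes, lut):
    """Map every packed pixel through the table to get one binary image."""
    return [[lut[code] for code in row] for row in pack_bit_planes(planes)]


# Four tiny 1x3 labeled images standing in for the four operator outputs.
planes = [
    [[1, 1, 0]],
    [[1, 0, 0]],
    [[1, 1, 0]],
    [[0, 0, 1]],
]
lut = build_majority_lut()
resolved = resolve_conflicts(planes, lut)
```

Because the table has only 16 entries, any set of heuristics over the four
operators can be encoded by changing the table contents alone; the per-pixel
work stays a single indexed read.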


In addition to the research being conducted at Honeywell under this
contract, the world-renowned Professor King-Sun Fu and his student Mr. Mohamed
Eshera, both from the Electrical Engineering Department at Purdue University,
are also involved in investigating generic context-independent image
segmentation via Attributed Relational Graphs (ARG). Further details are
contained in (12, 13).

2.2 AI-based Feedback Resegmentation

We have found that, in general, an image cannot be totally segmented in a
single pass of one segmentor. Better segmentation results when the image
is first segmented at a lower resolution into a few regions. These
regions are then in turn segmented into smaller regions. This process
continues until the image has been segmented into homogeneous regions.

This process can be represented as a tree where each node is a region.
Branches from a node show the smaller regions it is made up of. The head
node of the tree is the entire image and the leaf nodes are the smallest
elementary regions that the image is made up of. The size of these
smallest regions, and therefore the resolution of the segmentation, depends
on the size and type of the objects we are looking for in the image. This
means that in an image understanding system even the image segmentation
needs to be driven by higher level information such as range and the
system goals.

At each node in the segmentation tree we have many different segmentation
operators to choose from. The choice of an operator to use is based on
knowledge about the sensor, the goals of the system at a given time, and
knowledge about the behavior of the different operators on different input
conditions. It is also possible to choose more than one operator at a
given node. This creates parallel branches in the tree and multiple
representations for a region. These multiple representations give more
information about the region that can be used when classifying the region.
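
A segmentation tree with parallel branches can be sketched as a small data
structure. The sketch below is illustrative only: `RegionNode`, the two
placeholder operators, and the depth-based `choose_operators` rule are
hypothetical stand-ins for the knowledge-based operator selection described
above, and regions are reduced to bounding boxes to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class RegionNode:
    """One region in the segmentation tree."""
    bounds: tuple                 # (row0, col0, row1, col1), half-open
    operator: str = "root"        # operator that produced this region
    children: list = field(default_factory=list)


def split_quadrants(bounds):
    """Placeholder operator: cut a region into four quadrants."""
    r0, c0, r1, c1 = bounds
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    return [(r0, c0, rm, cm), (r0, cm, rm, c1),
            (rm, c0, r1, cm), (rm, cm, r1, c1)]


def split_halves(bounds):
    """Placeholder operator: cut a region into left and right halves."""
    r0, c0, r1, c1 = bounds
    cm = (c0 + c1) // 2
    return [(r0, c0, r1, cm), (r0, cm, r1, c1)]


OPERATORS = {"quadrants": split_quadrants, "halves": split_halves}


def choose_operators(depth):
    """Assumed policy: try both operators at the head node, one below it."""
    return ["quadrants", "halves"] if depth == 0 else ["quadrants"]


def grow(node, min_size, depth=0):
    """Resegment a region until its smaller side reaches min_size."""
    r0, c0, r1, c1 = node.bounds
    if min(r1 - r0, c1 - c0) <= min_size:
        return node                               # elementary region: a leaf
    for name in choose_operators(depth):          # several names = parallel branches
        for sub in OPERATORS[name](node.bounds):
            node.children.append(grow(RegionNode(sub, name), min_size, depth + 1))
    return node


def leaves(node):
    """Collect the elementary regions under a node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


tree = grow(RegionNode((0, 0, 8, 8)), min_size=2)
```

Here `min_size` plays the role of the goal-driven resolution limit: a system
looking for larger objects would stop the recursion earlier.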

Two examples show how this segmentation tree is created. To keep the
examples simple, only two different operators were used.

The first example is a FLIR image of a power plant. Figure 3 shows the
segmentation tree for this image. The image was first preprocessed to do
noise cleaning. The type of preprocessing used depends on the type of
sensor and on actual measurements from the original image. Operator 1
yielded a poor segmentation while operator 2 separated the image into good
regions. These regions were then cut out of the original preprocessed
image and segmented further using the two operators. The numbers in the
tree nodes correspond to the picture number. See pictures 1-10.

Our second segmentation example shows a FLIR image of a multi-lane highway
with vehicles on it. The segmentation tree for this image is shown in
Figure 4. After preprocessing, the image was segmented at a low
resolution to find the general regions of interest shown in picture 13.
Picture 5 shows one of these regions cut out and picture 6 shows the
segmentation of this region at a higher resolution. The other side of the
segmentation tree shows the same type of processing on different regions
of interest found by operator 2. See pictures 11-26.

2.3 Dynamic Scene Analysis Inference Engine (DSA)


Conventional image processing techniques deal mainly with the detection,
extraction, and recognition of objects, often by operating in a local area
around the target. They fall short of utilizing the spatial, temporal,
relational, and in general global information available in the scene.

To demonstrate the existence of this information in a dynamically changing
scene and to appreciate the benefits to scene analysis, we have synthesized
a sequence of Frames 1 through 6 on our SYMBOLICS 3600 LISP machine.

For instance, Frames 1 and 2 suggest that the three tanks will cross the
bridge, so the segmentor operators and parameters (edge operators,
background estimators, thresholds, etc.) should be chosen according to the
bridge contrast, texture, noise, etc. However, Frames 5 and 6 strongly
indicate that the tank convoy is turning upstream, and therefore the
segmentor should be directed toward segmenting tanks in terrain. This
results in more robust target segmentation and better scene understanding,
as it can "reasonably" be expected that the second and third tanks will
follow the first tank.

DSA is designed to exploit the synergistic benefits of these aspects in
scene understanding through a combination of reasoning and inferencing
techniques. The reasoning process is modeled after the expert's
sequential steps in understanding a scene from low- to high-level entities,
through the representation of operative knowledge by a frame/production
rule structure. The inferencing is designed to enhance the robustness
of the system by handling incomplete information, requesting missing
information, and identifying incorrect information. This could, for
instance, correspond to lowering the thresholds to search for a "suspect"
road that the initial segmentation missed. The system also includes query
and explanation facilities for the man-in-the-loop mode.

EVIDENCE-CONFIDENCE  PARADIGMS  AND  INFORMATION  FUSION 

In a distributed sensor environment, symbolic information fusion is the
integration of partial knowledge obtained from each sensor to arrive at a
complete sensor-derived representation. It uses model-based
representations of targets stored in the knowledge base and is guided by an
inference engine in the accrual of evidence and matching. The flow of
information during symbolic information fusion is depicted in Figure 5.
The outcome of the symbolic information fusion is the full identification
of objects in the scene.

The inference engine performs the reasoning process that involves the
matching of two or more representations to identify their differences and
similarities, and also updates the confidence measures of the different
possible targets and their components. These two functions are
accomplished by a semantic net comparator and an evidence accrual module.

Identifying the best match between the sensor-derived representation and
the model-based representation previously stored in the knowledge base
provides the following:

o Object recognition - identifies the object type from the sensor-
derived representation.

o Directed rederivation of target representation - the inference
engine is used to cue rederivation of the sensor-derived
representation so as to improve the level of confidence in the
object recognition.

o Occlusion prediction - postulates a partial sensor-derived
representation to predict missing or occluded components. This
enhances the recognition of occluded objects.

Matching tree data structures is generally carried out through classical
search methods and hypothesize-test paradigms. Classical search methods
include depth-first, breadth-first, and A* optimal search; these methods
are well documented in [14]. The hypothesize-test paradigm can be
performed in either forward chaining or backward chaining modes.
Forward chaining is also called data driven or bottom-up, and backward
chaining is called event driven or top-down.

As an example, consider a particular target such as a tank. In forward
chaining, components of the sensor-derived representation are matched with
the trunk, tread, and engine of the model-derived representation. If the
match succeeds, then the components are grouped and named body. Turret and
barrel are then identified. The hypothesis "the sensor-derived
representation is a tank" is then true. Backward chaining works in the
opposite direction. Heuristics can be applied to facilitate a
fast match. Honeywell is investigating both approaches.
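
The two chaining modes can be illustrated on the tank example with a toy
rule base. This is a sketch under stated assumptions, not the matcher under
development: the `RULES` table simply restates the grouping given in the
text (trunk, tread, and engine form the body; body, turret, and barrel form
the tank), and both function names are hypothetical.

```python
# Hypothetical rule base: (required components, concluded component).
RULES = [
    ({"trunk", "tread", "engine"}, "body"),
    ({"body", "turret", "barrel"}, "tank"),
]


def forward_chain(observed, rules):
    """Data driven: keep firing rules until no new component is concluded."""
    facts = set(observed)
    changed = True
    while changed:
        changed = False
        for needs, concludes in rules:
            if needs <= facts and concludes not in facts:
                facts.add(concludes)
                changed = True
    return facts


def backward_chain(goal, observed, rules):
    """Top-down: start from the hypothesis and look for supporting components."""
    if goal in observed:
        return True
    return any(
        all(backward_chain(need, observed, rules) for need in needs)
        for needs, concludes in rules
        if concludes == goal
    )


full_view = {"trunk", "tread", "engine", "turret", "barrel"}
partial_view = {"trunk", "tread", "turret", "barrel"}   # engine not found
```

With `full_view`, both modes conclude "tank"; with `partial_view`, neither
does, and the failed sub-goal ("engine") is the kind of missing component a
resegmentation request could target.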

The matching mechanism first constructs a network fragment representing a
sought-for object, e.g., a tank, and then matches the network fragment
against the network data base to see if such an object exists. For
example, if a tank is sought, the fragment network depicted in Figure 6
will be generated and the components will be matched. The matcher will
make inferences to create extended network structures, e.g., trunk,
engine, and tread, during the matching process.

The criteria for evaluating the matching technique we are developing are:

1.  Capability  to  recognize  objects  and  reject  clutter  with  partial 
Information. 

2.  Capability  to  provide  resegmentation  direction. 

3.  Capability  to  determine  occlusion. 


In any matching technique, a similarity measure is used to determine the
best match between the sensor-derived representation and one of the
model-derived representations. Commonly used similarity measures are
Euclidean distance, mean square error, and Bayesian probabilities. These
similarity measures work well in statistical pattern recognition, but
their applicability in partial knowledge matching is limited. In
artificial intelligence research, inexact reasoning has proved its
usefulness in medical diagnosis, chemical analysis, and natural language
understanding. Subjective Bayesian models have been used in expert
systems. In particular, Shortliffe and Buchanan [15] have devised a
method for incremental accrual of classification confidence which is based
on confirmation theory. The theory assumes that one can formulate
approximations for a priori and conditional probabilities by using them to
determine measures of "belief" and "disbelief". These belief measures are
in turn used to define measures of confidence and rules for incrementally
updating both the belief and confidence measures (see Figure 7).
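
For readers without Figure 7 at hand, the commonly cited form of the
Shortliffe-Buchanan update can be sketched as follows. This is an
assumption about which form is meant, not a transcription of the figure:
two supporting measures x and y combine as x + y(1 - x), and confidence is
belief minus disbelief. The `Hypothesis` class and the evidence strengths
are hypothetical.

```python
def combine(old, new):
    """Combine two measures in [0, 1] that support the same conclusion."""
    return old + new * (1.0 - old)


class Hypothesis:
    """Tracks belief, disbelief, and confidence for one candidate target."""

    def __init__(self):
        self.belief = 0.0
        self.disbelief = 0.0

    def add_evidence(self, strength):
        """Positive strength confirms, negative disconfirms; |strength| <= 1."""
        if strength >= 0:
            self.belief = combine(self.belief, strength)
        else:
            self.disbelief = combine(self.disbelief, -strength)

    @property
    def confidence(self):
        return self.belief - self.disbelief


tank = Hypothesis()
for strength in (0.4, 0.3, -0.2):    # made-up evidence strengths
    tank.add_evidence(strength)
```

The update is incremental and order-independent: each new piece of evidence
closes a fraction of the remaining gap to full belief, so the measure never
exceeds 1.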

The belief and disbelief measures, as they are implemented in expert
systems, are not spatially adaptive. That is, once the evidence for belief
is accrued, its significance never changes regardless of the outcome of
other spatially located sensors. Honeywell is extending the belief and
confidence measures to incorporate the incremental evidence provided by the
distributed sensors. Such a framework would have the potential for
providing a unified inferencing framework that can work with partial
representations and provide direction for rederiving the sensor-derived
representation.

Sensor-Derived Representation (SDR) - At every step, the semantic net
comparator produces information that validates or invalidates previous
evidence obtained from the sensors. The sensor-derived representation, as
a component in the distributed sensor target recognition system, maintains
and updates the values of the confidence measure for each of the possible
targets, e.g., truck, tank, jeep, etc. To accomplish this task the SDR
makes a copy of the model-based representations from the knowledge base
for each of the candidate targets. Figure 6 depicts the initial state of
the sensor-derived representation. The graphs initially contain zero
confidence measures for all the targets and their components. As the
inference engine obtains information from the 2-D views, it builds or
accrues evidence for each of the target graphs. For example, Figure 8
displays the graph for a tank with a total confidence measure of 0.492.
Table 1 is an example of how the SDR updates the confidence measures as
the 2-D views are analyzed. The distributed sensor system arrives at a
specific target identification whenever the target's confidence measure
exceeds a suitable threshold or it exhausts the available sequence of 2-D
views, in which case it chooses the target with the highest
confidence measure and provides suitable warnings.
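
The stopping rule just described can be sketched as a short loop. The
sketch is hypothetical throughout: the function name, the per-view evidence
values, the threshold, and the accrual rule (old + new(1 - old)) are
assumptions for illustration and are not taken from Table 1.

```python
def identify(views, targets, threshold):
    """Accrue evidence view by view; stop early once a target clears threshold.

    views: list of dicts mapping target name -> evidence strength in [0, 1].
    Returns (target, confidence, warning), where warning is None on success.
    """
    confidence = {target: 0.0 for target in targets}   # initial state: all zero
    for view in views:
        for target, strength in view.items():
            old = confidence[target]
            confidence[target] = old + strength * (1.0 - old)  # assumed rule
        best = max(confidence, key=confidence.get)
        if confidence[best] > threshold:
            return best, confidence[best], None
    # Views exhausted: fall back to the highest confidence, with a warning.
    best = max(confidence, key=confidence.get)
    return best, confidence[best], "threshold not reached"


targets = ["truck", "tank", "jeep"]
views = [
    {"tank": 0.3, "truck": 0.2},
    {"tank": 0.4},
    {"tank": 0.5, "jeep": 0.1},
]
confident = identify(views, targets, threshold=0.7)
uncertain = identify(views, targets, threshold=0.9)
```

In this made-up run the tank clears a threshold of 0.7 on the third view,
while a threshold of 0.9 is never reached and the tank is returned with a
warning attached.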

The sensor-derived representation evolves as more information about the
target(s) is obtained from the different images or derived by the inference
engine. Honeywell's unique approach to symbolic information fusion is
based on coordinating, updating, and validating the attributes of targets
to achieve a parsimonious representation. Redundancies and conflicts are
resolved by the inference engine at the fusion level.


CONTROL  OF  INFORMATION  FLOW 


In a recent work [4], we explored ways in which fundamental concepts in
artificial intelligence (AI) can be applied to image understanding (IU)
systems. The three basic areas were the use of:

1.  Knowledge  Representation  Levels 

2.  Control  Structures 

3.  Constraints  (both  natural  and  domain-specific). 

The use of different representation levels is an area that is now well
known in the IU field. Several notable researchers have developed this
concept and its application to IU systems in some detail [16, 17]. This
is an area, however, where it may still be somewhat too early to attempt
to classify and clarify the numerous types of constraints available, their
interactions, and the ways in which they can best be incorporated into
intelligent IU systems.

In comparison with the above two areas, the use of control structures in
image understanding systems stands out as an area which has not yet
received strong conceptual development, but is ripe for just such an
approach. For purposes of this report, we will distinguish control
structures from knowledge representation by stating that representation
levels will be used to store static knowledge at different levels of
refinement throughout the system. We can imagine looking at "snapshots"
of the contents of the different representation levels at different times
during the processing in the system. At any given moment, the contents of
a representation level essentially portray a static form of information
regarding the objects in a scene and their relationships with one another.
In contrast, the control structure for an IU system will contain
inherently dynamic or process knowledge; it will be the knowledge about
how to use or operate on the information in the different representation
levels in order to generate further or more refined knowledge. These
definitions, of course, have been adapted from the classical AI concepts
for IU purposes, and thus may be somewhat different from other points of
view as a result.


The  Nature  of  Hierarchical  Systems  - 

The concept of a hierarchical system may be well-defined. At the outset,
hierarchical systems may be categorized as those which have the following
features (18):

1.  There  Is  a  vertical  arrangement  of  the  subsystems. 

2.  The  higher  subsystems  have  the  right  to  Intervene  In  the  actions 
of  the  lower  subsystems. 

3.  The  effectiveness  of  the  higher  level  subsystems  depends  on  the 
actual  performance  of  the  lower  levels. 

There are three basic views of hierarchical systems. Each of these may
be applied to any given system, although often one view may contain more
usable information than another.

A system may be viewed as successive levels of description or
abstraction. These levels, or strata, all describe the same system, but
each description carries different information. Another hierarchical view
of a system would be based on levels of decision complexity. This view,
in which the levels will be referred to as layers, will prove quite useful
for AI applications, and so will be discussed in some depth. A final view
of systems is based on the organization of decision making units. In this
view, where the levels are referred to as echelons, the distinction is
made on the horizontal relations of units, where there should be more than
one unit on the lowest level. The layers, mentioned above, are
distinguished on the basis of vertical decomposition into subsystems.
These two concepts are quite similar, and will often blend together in the
discussion which will follow.

With respect to hierarchical arrangements of the decision units which
comprise a system, the following categories of decision-making systems can
be recognized: single-level, single-goal systems; single-level, multigoal
systems; and multilevel, multigoal systems. The AI/IU programs to which
this theory will be applied fall into the last category.


Communication between supremal and infimal units must be bidirectional.
The supremal unit can signal downward to the infimal unit, where the
signal will represent intervention. This intervention should specify
decision problems for infimal units. Infimal units should also be able to
signal upward to the supremal units. Generally, their signals will
represent the status of activities undertaken.

In general, the supremal units have two broad responsibilities in dealing
with the infimal units. The first is to instruct infimal units in how to
proceed by selecting for them the rules and procedures to be followed.
This is referred to as "selection of a coordination mode." The second
major responsibility is to influence the infimal units to change their
actions (if necessary), or to adjust the roles of infimal units in order
to improve performance. The selection of an actual intervention or the use
of a control variable will be referred to simply as "coordination."

It is also necessary that units on the same level be allowed some form of
communication. The relationships between units of the same level will be
characterized by the action of a unit and by the response of the rest of
the system as it influences that unit. This response is referred to as an
interface input. Supremal units determine how infimal units will account
for the interface input. There are five general ways in which this can be
done. Summarized below, they are:

1. Interaction Prediction Coordination, in which the supremal unit
specifies the interface output. The infimal units solve their
local decision problems based on this information alone from the
other units, and must assume that it correctly represents the
known state of the system.

2. Interaction Estimation Coordination, in which the supremal unit
specifies a range of possible values which the interface inputs
may have. The infimal units treat the inputs as disturbances
which may assume any value within the given range.

3. Interaction Decoupling Coordination, which allows the infimal units
to treat the interface input as an additional decision variable.
They solve their decision problems as though the value of the
interface input could be chosen independently.

4. Load-Type Coordination, which is the first case in which the infimal
units actually recognize the existence of other units on their
level. The supremal unit provides the infimal units with a model
of the relationship between a unit's own action and the response
of the system.

5. Coalition-Type Coordination, which expresses the situation in which
the infimal units not only recognize the existence of other units on
their level, but are allowed an interaction with other units. The
form of the interaction is controlled by supremal units.

Of these approaches, the last is the most sophisticated and comes the
closest to the organization of human hierarchical systems. However, it is
quite complex. The first three approaches are those that would be easiest
to implement in an AI/IU application. It is worth noting, however, for
possible future reference, that the usefulness of communication between
infimal units depends on how the infimal problems are defined. Some
problems benefit from communication, whereas others (generally the
simplest sort) do not. Also, it can be shown that excessive
communication between units may have the same effect as a lack of
communication, leading to an overall deterioration of performance.
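
The first three coordination modes can be illustrated with a small
sketch. Everything in it is an assumption made for illustration: the
quadratic local cost, the names, and the grid search are hypothetical and
are not taken from the theory in (18). The sketch only shows how one
infimal unit would treat the interface input u under each mode.

```python
from typing import Callable, List, Tuple

# Hypothetical local cost of one infimal unit: m is the unit's own
# decision, u is the interface input coming from the rest of the system.
Cost = Callable[[float, float], float]


def example_cost(m: float, u: float) -> float:
    """Assumed quadratic cost, used only for illustration."""
    return (m + u - 1.0) ** 2 + 0.1 * m ** 2


def grid(lo: float, hi: float, n: int) -> List[float]:
    """Evenly spaced candidate values between lo and hi (inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def prediction_mode(cost: Cost, decisions: List[float],
                    u_predicted: float) -> float:
    """The supremal unit supplies one value; the infimal unit trusts it."""
    return min(decisions, key=lambda m: cost(m, u_predicted))


def estimation_mode(cost: Cost, decisions: List[float],
                    u_range: List[float]) -> float:
    """The supremal unit supplies a range; the infimal unit treats u as a
    disturbance and minimizes its worst-case cost over that range."""
    return min(decisions, key=lambda m: max(cost(m, u) for u in u_range))


def decoupling_mode(cost: Cost, decisions: List[float],
                    u_candidates: List[float]) -> Tuple[float, float]:
    """The infimal unit treats u as an extra decision variable and
    chooses it together with m."""
    pairs = ((m, u) for m in decisions for u in u_candidates)
    return min(pairs, key=lambda mu: cost(mu[0], mu[1]))


if __name__ == "__main__":
    ms = grid(-2.0, 2.0, 401)
    us = grid(0.0, 1.0, 101)
    print("prediction:", prediction_mode(example_cost, ms, 0.5))
    print("estimation:", estimation_mode(example_cost, ms, us))
    print("decoupling:", decoupling_mode(example_cost, ms, us))
```

Under these assumptions the three modes differ only in who fixes u and how
much the infimal unit is told about it, which is why they are the easiest
to implement.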

Complex  Hierarchical  Systems 

It can be useful at this point to review the terminology introduced
previously to describe the levels in a hierarchical system. The concept
of strata was introduced to indicate the choice of abstraction levels that
could be used. The concept of layers refers to the vertical decomposition
of a decision problem into subproblems, and the concept of echelons is used
when there is more than one decision unit on one layer. In the previous
discussion, the concept of communication between units on the same layer
implicitly referred to a multiechelon decision-making system.

It is possible that a complex problem or situation would require a complex
multilayer hierarchy to adequately represent the system. In this case,
each of the decision units in a multiechelon hierarchy may use a
multilayer approach to solve its own local subproblems. This concept
is illustrated in Figure 10.

It would also be possible to decompose a larger system into several
vertically arranged decision units. Each decision unit could be viewed as
both a multilayer hierarchy and a multiechelon hierarchy, as is
illustrated in Figure 11. In this example, the learning and adaptation
layer corresponds to an echelon with two units, and the selection layer to
an echelon of four units.

Regardless of how complex a hierarchy becomes, several features will
remain constant:

1. A supremal unit will always be concerned with a larger portion or
broader aspect of the overall system behavior.

2. The decision period of a supremal unit will be longer than that of
its infimal units.

3. The supremal units will be concerned with the slower aspects of
the overall system behavior.

4. Decisions and problems on higher levels are less structured, will
contain more uncertainties, and will be more difficult to
formalize quantitatively.

Using  MLMGHST  as  a  control  structure  paradigm 

After the preceding discussion of the nature of Multilevel, Multigoal
Hierarchical System Theory (MLMGHST), we might ask ourselves: Why should
this apply to control structure issues in AI/IU systems?

The purpose of an IU control structure is to guide the choice of
appropriate algorithms in order to achieve certain goals. The control
structure of an IU system is both goal-oriented and process-oriented. It
is easy to see that in a complex system, the hierarchical ordering of
process knowledge into strategic, tactical, and operational (algorithmic)
processes could prove useful. If this is so, then the design schema
presented before could prove a useful basis for system design. Further,
the five types of interlevel communication protocols presented earlier
could prove useful, not only in designing an initial system, but also in
planning for future system development and upgrading, by successively
modifying the communication protocols into more complex structures.

That MLMGHST is primarily useful for structuring control in a system,
rather than designing the representation levels or the use of constraints,
can be shown by examining each of these system components separately.
Although the commonly suggested structure of IU knowledge representation
levels is hierarchical, each level may be viewed as a means of structuring
the relatively static knowledge in each image (Figure 9). Algorithms are
used to pass information from one level to the next. Although there is
some possibility for using the ideas of self-organizing, learning and
adaptation, and selection levels in designing more complex representation
levels, it appears that the most immediate and profitable use of MLMGHST
is to apply it to the control structure, which contains the process
knowledge for the system. In this way, we would have to view the multilayer
control structure as a construct superimposed on the representation levels
and algorithms. The algorithms or operators would be in the same plane as
the representation levels, but we would have to view the strategic and
tactical control levels as projecting out of the plane, forming a 3-D
construct.

The use of MLMGHST is similarly more suited for control than for
structuring the use of constraints. There are two types of constraints
which are currently in use in AI/IU systems: natural and domain-specific.
The natural constraints which are currently in use are low level (e.g.,
"surfaces tend to be continuous"), and hence are incorporated into the
system at the algorithmic level. Higher level natural constraints are not
well evolved.

On the other hand, domain-specific constraints embody knowledge about
subjects or attributes that are likely to pertain to the image being
used. This knowledge often expresses relationships among the observable
features or regions. It is often stored in knowledge bases which are
separate from but associated with the representation levels. Often, the
use of domain-specific constraints is crucial to the success of an IU
system, but since the constraints are unique to each system, examples of
their use are deferred until later. The MOLGEN system [19] is an example of
how domain-specific constraints may be used as an integral part of the
classical expert system with hierarchically organized process knowledge or
control.

Given that MLMGHST is best suited for application to organizing the
control structure of a system, it is possible to further specify those
types of systems for which it can best be used. First, the system should
be sufficiently complex that the organization of processes into
strategic, tactical, and algorithmic groupings seems natural. This could
mean that a large number of algorithms and/or algorithm pathways
connecting representation levels should be available. A second aspect of
a system that would benefit from this type of control is that the system
should be designed to handle multiple (but not necessarily simultaneous)
goals. For example, an IU system for robot bin-picking would benefit from
this type of control structure if there is more than one type of part for
which it will search. An IU system for an autonomous vehicle would have
the multiple goals of needing to characterize both objects and terrain.
Different types of representation could be required for the different
types of objects and the terrain, necessitating a strategic/tactical
approach to determining which of the several representation schemas should
be employed. As a third example, multisensor IU systems will need to
determine which, among several sensors or combinations of sensors, would be
most effective under different conditions, or when searching for different
types of objects. Each of these example areas presents a compelling need
for designing robust control structures which are more sophisticated than
those currently in use.

5. INVARIANT METHODS IN IMAGE UNDERSTANDING

5.1 Object recognition and scene parametric analysis

An analysis of the effect that scene parameters have on object
recognizability is fundamental to the understanding of distributed sensor
phenomenology. Accordingly, we started out with a parametric analysis
which includes the following:

o Evaluation of the number and location of 3-D sample points which
are necessary for the discrete representation of objects, and

o Investigation of the effect of sensor aspect angle, depression
angle, range to an object, and sensor angular separation on the
recognizability of the objects.

The importance of the sampling point selection for the discrete
representation of any object becomes apparent once one is reminded that in
all the past attempts at 3-D object recognition, assumptions have been made
about the availability of 3-D points which completely represent an
object. These points usually are obtained arbitrarily or imposed on the
3-D objects by projecting a rectangular mesh over their surfaces. No
attempt has been made to justify these approaches or study the effect of
point selection on the recognition of the object.

The effect of the angle of separation between the sensors, aspect and
depression angles, as well as object range, on the recognizability of the
object has also not been addressed in the past. It is intuitively
apparent that as the angle of separation increases, more of the object is
viewed, hence more information is obtained for recognition. While such
intuitive feelings are helpful, a quantitative parametric analysis is
essential for a thorough phenomenological understanding.

In order to permit a parametric analysis one needs a suitable figure of
merit of target characterization, i.e., a barometer of target
recognizability. This target attribute has to be capable of adequately
characterizing the target object. One can then carry out a parametric
analysis by studying how this attribute degrades from its ideal value as
the parameters are varied.

One such attribute which we have used is a 3-D moment invariant [20]. In
this method, a set of functions that may be used to represent 3-D objects
independent of size and coordinate system is derived. Knowing the proper
number of discrete points and their position on the object is a
prerequisite for this calculation, and hence the importance of a
theoretical analysis of the 3-D sampling phenomenon. The next logical
step is to determine the effect of undersampling on the calculation of 3-D
moment invariants. This undersampling can be due to the fact that, based
on the sensors' geometrical location and orientation, only part of the
object may be "seen" and so the sample points may not "cover" the whole 3-D
target surface. Likewise, the other scene parameters may also be varied
and their effect on target recognizability analyzed by observing the 3-D
moment invariants. We now provide a brief review of the moment
invariants.

3-D Moment Invariants - In order to recognize any 3-D object independent of
size, position, and orientation, one must obtain measurements which convey
the invariant attributes of the object. The use of three-dimensional
moment invariants provides an excellent representation of 3-D objects
[20].

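The three-dimensional central moments of order $p + q + r$ of a density
$g(x_1, x_2, x_3)$ are defined as:

$$\mu_{pqr} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1^{p}\, x_2^{q}\, x_3^{r}\, g(x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3$$

where, for the sake of simplicity, the centroid is assumed to be at the
origin.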
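
For a discretely sampled object the integral above becomes a weighted sum
over the sample points. The following is a minimal sketch under that
assumption; the function name and the use of unit weights per point are
choices made here for illustration, and the centroid is subtracted
explicitly rather than assumed to be at the origin.

```python
import numpy as np


def central_moment(points, p, q, r, weights=None):
    """Discrete 3-D central moment of order p + q + r.

    points  : (N, 3) array of sample coordinates (x1, x2, x3)
    weights : optional (N,) array of density samples g; defaults to 1
    """
    pts = np.asarray(points, dtype=float)
    if weights is None:
        w = np.ones(len(pts))
    else:
        w = np.asarray(weights, dtype=float)
    # Shift the coordinates so that the weighted centroid is the origin.
    centroid = (w[:, None] * pts).sum(axis=0) / w.sum()
    x = pts - centroid
    # Sum of x1^p * x2^q * x3^r * g over all sample points.
    return float((w * x[:, 0] ** p * x[:, 1] ** q * x[:, 2] ** r).sum())


if __name__ == "__main__":
    # Eight corners of a cube of side 2, placed away from the origin.
    corners = np.array([[i, j, k] for i in (4, 6)
                        for j in (4, 6) for k in (4, 6)])
    print(central_moment(corners, 2, 0, 0))
    print(central_moment(corners, 1, 1, 0))
```

Because the centroid is removed first, the result does not depend on where
the object sits in the sensor coordinate system.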

It has been shown [20] that for quadratic surfaces, which form a special
but important subset of general ternary quantics, a set consisting of two
moment invariants can be derived with the following results:

Analysis Procedure - For the parametric analysis, one needs to study the
variation of the moment invariants from their ideal values for each object
class as the sensor parameters are varied. The following steps will be
taken for the analysis:

1. The object is first encoded into a computer by using a scale model of
the object and any one of the various 3-D coordinate machine devices
that are commercially available [21].

2. In order to study the effects of 3-D sampling, in analogy with the 1-D
and 2-D cases, we first select a set of suitable cut-off frequencies
(since the Fourier transform of any finite object has infinite extent
[22]) using appropriate criteria, e.g., a maximum-volume criterion,
which leads to the selection of the first zero crossings. This then
serves as the baseline for studying the effects of undersampling, which
can be simulated on the computer.

3. In order to study the effects of the scene parameters, the desired view
of the object is obtained by means of computer software such as the
commercially available MOVIE-BYU [23], which can give the desired view
of the object when range, depression, and aspect angles are specified.

4. The coordinates of the points that are viewed by the sensor are
evaluated by changing one parameter at a time while keeping the others
constant.

5. The 3-D moment invariants of the target as it is viewed by the sensors
are then calculated.

6. The moment invariant sets are then obtained for other objects under
identical geometrical conditions.

7. A distance measure is defined in terms of sets of moment invariants of
the objects as functions of the various parameters. This provides a
quantification of the objects' recognizability.
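
Steps 5 to 7 can be sketched as follows. This is an illustration only:
the two quantities computed below are standard rotation invariants of the
second-order central moment matrix, normalized so that they are also
independent of size. They are assumed here as a stand-in and are not
claimed to be the exact expressions of [20]; the Euclidean distance is
likewise just one possible distance measure.

```python
import numpy as np


def second_order_invariants(points):
    """Two size-, position-, and orientation-independent numbers computed
    from the second-order central moments of a 3-D point set.

    Assumes the points are not all coplanar or collinear, so that the
    normalizing terms are non-zero.
    """
    pts = np.asarray(points, dtype=float)
    x = pts - pts.mean(axis=0)               # remove position
    m = x.T @ x / len(x)                     # 3x3 second-order moment matrix
    j1 = np.trace(m)                         # sum of eigenvalues
    j2 = 0.5 * (j1 ** 2 - np.trace(m @ m))   # sum of principal 2x2 minors
    det = np.linalg.det(m)                   # product of eigenvalues
    # j1, j2, det are unchanged by rotation; the ratios remove size.
    return np.array([j1 ** 2 / j2, det / j1 ** 3])


def invariant_distance(points_a, points_b):
    """Euclidean distance between the invariant sets of two objects."""
    diff = second_order_invariants(points_a) - second_order_invariants(points_b)
    return float(np.linalg.norm(diff))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obj = rng.normal(size=(200, 3)) * [3.0, 2.0, 1.0]
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
    moved = 2.5 * obj @ q.T + [10.0, -4.0, 7.0]    # same shape, new pose
    other = obj * [1.0, 1.0, 10.0]                 # a different shape
    print(invariant_distance(obj, moved), invariant_distance(obj, other))
```

Repeating the distance computation while one scene parameter is varied and
the sample points are restricted to those visible to the sensors would give
the degradation curves that the analysis calls for.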

To obtain a phenomenological understanding of some of the parameters that
have effects on object recognizability we will further investigate the
following:

1. The effect of the number of sampling points and their location on the
object on the recognizability of the object,

2. The effects of the sensors' angle of separation on the object
recognition ability,

3. The effects of the depression and azimuth angles of the sensors on the
object recognition performance, and

4. The effect of the range of the object from each sensor on the
recognition performance.


We present a summary of an image representation that uniquely encodes the
information in a gray-scale image, decouples the effects of illumination,
reflectance, and angle of incidence, and is invariant, within a linear
shift, to perspective, position, orientation, and size of arbitrary planar
forms. A detailed description of our research results can be found in a
separate Technical Report (11).

The challenge of the visual information problem stems from the fact that
the interpretation of a 3-D scene from a single 2-D image is confounded by
several dimensions of variability. Such dimensions include uncertain
perspective, position, orientation, and size (pure geometric
variability), along with sensor mode, object occlusion, and non-uniform
illumination. Vision systems must not only be able to sense the identity
of objects despite this variability, but they must also be able to
explicitly characterize such variability. This is so because the
variability in the image formation process (particularly that due to
geometric distortion and varying angle of incident illumination) inherently
carries much of the valuable information about the imaged scene. Consider
human vision for the moment. In spite of the complications introduced by
geometric distortion, it is precisely the "unraveling" of such distortions
that enables a human to readily perceive the three-dimensionality of any
static 2-D image, be it a single face of a Necker cube or a 15th-century
painting by Da Vinci. Indeed, humans seem capable of unraveling the
physical and geometric distortions in an image almost as precisely as the
physics and geometry of the world created them in the first place.

Contrasted with the apparent ease and elegance of human visual
interpretation of scene geometry, current vision algorithms are clearly
lacking. It is becoming increasingly clear that much of the blame lies
with conventional image representations. Good image representations must
satisfy a number of requirements which seem to be mutually incompatible.

First, they should be simultaneously compact and complete in their
representation of gray-scale image information. Compactness is synonymous
with ease of computation and efficient use of memory. Completeness, on
the other hand, implies that, if desired, one could fully reconstruct the
original image from which the representation is derived. Secondly, image
representations must provide good intra-object clustering and inter-object
separability independent of image distortion while at the same time
preserving information about pattern distortion. No conventional image
representation satisfies all of the above conditions. Though many
conventional representations claim compactness, most do not make a
credible attempt to decouple information about object identity from
information about viewing geometry and illumination, nor do current
representations fully exploit the abundance of gray-scale information in
an image. In contrast, invariant "analogical" image representations,
such as that advocated in this project, can satisfy the above
requirements.


Decoupling Multiplicative Processes in Image Formation.


For the case of an ideally diffusing surface, an image can be modeled as a
product of three independent signal components:

$$f(x,y) = i(x,y) \cdot r(x,y) \cdot \cos t \qquad (1)$$

where $i(x,y)$ is the illumination, $r(x,y)$ is the reflectance, and $t$ is
the angle of incidence of the illuminating light (24). Let us assume that
any additive noise that is present has small magnitude relative to the above
three components of the signal. Then there is a well-known and effective
method known as homomorphic filtering that allows one to individually
filter out such multiplicative signal components when certain reasonable
conditions are met. This method is briefly reviewed below.


Suppose one takes the logarithm of the function $f(x,y)$ as defined above.
The result is the so-called "density image":

$$\ln f(x,y) = \ln i(x,y) + \ln r(x,y) + \ln \cos t \qquad (2)$$

Hence the product becomes a sum of three density components, $\ln i$,
$\ln r$, and $\ln \cos t$. If these three additive density components have
Fourier spectra which overlap very little in their regions of significant
energy, then linear filtering can be used to extract any one of them. By
exponentiating the extracted component, one obtains the corresponding
multiplicative signal component which appeared in the original expression
for the image $f(x,y)$. Homomorphic filtering has been applied to enhance
imagery by selectively filtering out the slowly varying illumination
component with very impressive results (25).
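
The procedure can be sketched in a few lines. The following is an
illustration under stated assumptions, not the implementation used in
(25): the Gaussian high-pass filter, the cutoff parameter (in cycles per
pixel), and the small offset eps that guards the logarithm are all choices
made here for the example.

```python
import numpy as np


def homomorphic_highpass(image, cutoff=0.05, eps=1e-6):
    """Suppress the slowly varying multiplicative component of an image.

    image  : 2-D array of positive intensities
    cutoff : assumed Gaussian high-pass width, in cycles per pixel
    eps    : assumed small offset that keeps the logarithm finite
    """
    density = np.log(image + eps)              # product becomes a sum
    spectrum = np.fft.fft2(density)
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius_sq = fx ** 2 + fy ** 2
    # Gaussian high-pass: near 0 at low frequencies, near 1 above cutoff.
    highpass = 1.0 - np.exp(-radius_sq / (2.0 * cutoff ** 2))
    filtered = np.real(np.fft.ifft2(spectrum * highpass))
    return np.exp(filtered)                    # back to a multiplicative signal


if __name__ == "__main__":
    y, x = np.mgrid[0:128, 0:128]
    illumination = 1.0 + 0.8 * np.cos(2 * np.pi * x / 128)    # slow variation
    reflectance = np.where(((x // 4) + (y // 4)) % 2 == 0, 0.8, 0.2)
    recovered = homomorphic_highpass(illumination * reflectance)
    print(np.corrcoef(np.log(recovered).ravel(),
                      np.log(reflectance).ravel())[0, 1])
```

The sketch works on the synthetic example because the illumination and
reflectance densities occupy nearly disjoint frequency bands, which is the
condition stated above. The filter also removes the mean of the density
image, so the output is recovered only up to an overall scale factor.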

The success of homomorphic filtering clearly stems from the fact that the
Fourier transform of a density image often has the effect of
"representationally" decoupling multiplicative image components. If image
pattern recognition (rather than image filtering) could be based on such a
Fourier representation, then perhaps the effects of surface reflectance,
illumination, and angle of incidence could be decoupled. This in turn
could lead to methods for form recognition that are insensitive to varying
illumination conditions. Unfortunately, as is well known, neither the
Fourier transform nor its (phaseless) power spectrum has proved to be
especially useful for image pattern recognition. However, there are
alternative representations, namely simultaneous spatial/spatial-frequency
representations, which, like the Fourier spectrum, can provide
the decoupling of multiplicative image density components and, at the same
time, overcome the classical shortcomings of the Fourier spectrum as an
image representation.

Simultaneous  Representations  of  Space  and  Spatial  Frequency. 

Vision researchers have traditionally emphasized the importance of either
the spatial or the spatial-frequency domain, but not both. This should be
contrasted with conventional representations of acoustic signals, where
simultaneous time/frequency representations (e.g., the spectrogram) have
long been used. Nonetheless, vision researchers have very recently begun
to express an active interest in simultaneous spatial/spatial-frequency
image representations (26, 27, 28, 29, 30). Such representations can
provide for improved separability of information characterizing visually
relevant patterns, and they are also compatible with the representation of
gray-scale characteristics ranging from textures to object contours (27).
Furthermore, simultaneous spatial/spatial-frequency image representations
can be used to decouple the effects of illumination, object reflectance,
and angle of incidence.


The Wigner Distribution (WD).


Two simultaneous spatial/spatial-frequency representations have recently
received much attention. They are the Gabor function representation (31,
32, 33) and the Wigner distribution (26, 27, 28, 29, 30, 34). (Note that
we are here concerned only with 4-D directionally selective
representations; this excludes, for example, Marr's 3-D DOG representation
(24) and others like it which do not have a fourth dimension covering
spatial-frequency angle.) As discussed elsewhere (30), the Wigner
distribution provides higher simultaneous resolution than is possible
using the Gabor functions. In fact, as discussed in (35), every
simultaneous representation ever proposed can be expressed in terms of
averages of the Wigner distribution over its independent spatial and
spatial-frequency variables. Like the Fourier transform, the Wigner
distribution (WD) is not, in general, a computable function, since this
would require the evaluation of infinite integrals. However, just as the
Fourier transform has proved to be an elegant and convenient
transformation with which to handle many problems in the spatial-frequency
domain, the WD is an invaluable tool for problems simultaneously involving
the spatial and spatial-frequency domains. As with the Fourier transform,
the WD can be used in practice by employing an approximation obtained via
finite integration windows. A particularly attractive approximation to
the WD, denoted as the "composite pseudo Wigner distribution" (CPWD), has
been introduced (30).
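
To make the idea of a finite-window approximation concrete, the sketch
below computes a plain 1-D discrete pseudo-Wigner distribution. It is an
assumption-laden illustration: it is one-dimensional, it uses a rectangular
lag window, and it is not the CPWD of (30). The function name and the
half_window parameter are hypothetical.

```python
import numpy as np


def pseudo_wigner_1d(signal, half_window):
    """Discrete pseudo-Wigner distribution of a 1-D signal.

    Returns W[n, k], where n is position and k is a frequency bin.
    With M = 2 * half_window + 1 lags, bin k corresponds to the
    frequency k / (2 * M) cycles per sample (note the factor of 2,
    which comes from the product f[n + m] * conj(f[n - m])).
    Samples outside the signal are treated as zero.
    """
    f = np.asarray(signal, dtype=complex)
    length = len(f)
    lags = np.arange(-half_window, half_window + 1)
    n_lags = len(lags)
    pad = np.zeros(half_window, dtype=complex)
    padded = np.concatenate([pad, f, pad])
    # Fourier kernel over the lag variable, one row per frequency bin.
    bins = np.arange(n_lags)
    phase = np.exp(-2j * np.pi * np.outer(bins, lags) / n_lags)
    wd = np.empty((length, n_lags))
    for n in range(length):
        centre = n + half_window
        # Local correlation product inside the finite window.
        kernel = padded[centre + lags] * np.conj(padded[centre - lags])
        # The kernel is conjugate-symmetric in the lag, so the sum is real.
        wd[n] = np.real(phase @ kernel)
    return wd


if __name__ == "__main__":
    n = np.arange(128)
    # Frequency 2/33 in the first half, 5/33 in the second half.
    sig = np.where(n < 64, np.exp(2j * np.pi * (2 / 33) * n),
                   np.exp(2j * np.pi * (5 / 33) * n))
    wd = pseudo_wigner_1d(sig, 16)
    print(np.argmax(wd[30]), np.argmax(wd[100]))
```

Each row of the result is a local spectrum centred on one position, which
is what allows position and frequency content to be examined together.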

The  WD  and  Multiplicative  Signal  Component  Separability 

The essence of homomorphic filtering was to transform the image
density function (i.e., the logarithm of the sensed image function) into
the spatial-frequency domain in order to decouple the image components due
to illumination, reflectance, and angle of incidence. To simplify our
calculations, we assume a two-component model including only illumination
and reflectance:

$$f(x,y) = i(x,y) \cdot r(x,y) \qquad (3)$$

Taking the logarithm:

$$\ln f(x,y) = \ln i(x,y) + \ln r(x,y) \qquad (4)$$

and computing the WD of both sides yields:

$$W_{\ln f}(x,y,u,v) = W_{\ln i}(x,y,u,v) + W_{\ln r}(x,y,u,v) + 2\,W_{\ln i,\,\ln r}(x,y,u,v) \qquad (5)$$

where $(x,y)$ and $(u,v)$ specify the spatial and spatial-frequency domains,
respectively. The third component, representing the cross-Wigner
distribution (6), occurs because computing the WD involves a nonlinear
correlation operation. Nevertheless, if the auto-WD's of the functions
$\ln i$ and $\ln r$ do not overlap substantially in space and spatial
frequency, then approximations to the cross-WD contribution will in practice
contain negligible energy. (This non-obvious fact follows from considering
the definition of the CPWD (30).) Therefore, as with the Fourier transform,
if the regions of significant spectral energy of $W_{\ln i}$ and
$W_{\ln r}$ are disjoint, then separability of the multiplicative signal
components will have been achieved. These observations generalize to the
case where angle of incidence effects are included; then the right hand side
of equation (5) would include the auto-WD of the density function of this
component along with its cross-WD's with the illumination and reflectance
density functions. The Wigner distribution, like the Fourier transform, can
therefore decouple the effects of illumination, reflectance, and angle of
incidence when the spectra of these components are mutually disjoint. We
describe in the remainder of this report how the Wigner distribution can
be employed to define a unique image representation that, in addition to
providing decoupling of multiplicative image components, also provides
invariance to geometric distortions introduced by the imaging process.
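
The origin of the cross term can be made explicit with a short
derivation. It assumes the standard 2-D definition of the cross-Wigner
distribution, which the text above does not write out, and uses the
shorthand $a = \ln i$, $b = \ln r$ introduced here for brevity.

```latex
% Assumed standard definition (not stated explicitly in the report):
W_{a,b}(x,y,u,v) = \iint a\!\left(x+\tfrac{\alpha}{2},\, y+\tfrac{\beta}{2}\right)
                         b^{*}\!\left(x-\tfrac{\alpha}{2},\, y-\tfrac{\beta}{2}\right)
                         e^{-i(u\alpha + v\beta)}\, d\alpha\, d\beta ,
\qquad W_{a} \equiv W_{a,a}.

% The definition is linear in its first argument and conjugate-linear
% in its second, so for \ln f = a + b:
W_{a+b} = W_{a} + W_{b} + W_{a,b} + W_{b,a}.

% Substituting (\alpha,\beta) \to (-\alpha,-\beta) shows W_{b,a} = W_{a,b}^{*}, hence
W_{a+b} = W_{a} + W_{b} + 2\,\mathrm{Re}\, W_{a,b}.
```

Under this definition the third term of equation (5) is the real part of
the cross-WD counted twice. It is the only term that mixes the two
components, which is why it vanishes in practice when the two auto-WD's
occupy disjoint regions.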

Invariant  Form  Recognition  In  The  Frontoparallel  Plane. 

As  discussed  earlier*  good  Image  representations  should  not  only  decouple 
the  effects  of  Illumination#  reflectance  and  angle  of  Incidence#  but  they 
should  also  allow  objects  to  be  recognized  Irrespective  of  the  a  priori 
unknown  geometric  distortions  Introduced  by  the  Image  formation  process. 

We begin this section by discussing only methods for obtaining invariance
to linear transformations of non-occluded, planar gray-scale patterns in
frontoparallel view. The more general problem including perspective
distortion is reviewed in (11). One particular approach for obtaining
invariance to linear transformations makes use of the complex-logarithmic
(CL) conformal mapping. Some researchers (36,37) have advocated use of
the representation derived by CL conformally mapping the gray-scale image
function itself. Others (38,39,40) have suggested that the CL conformally
mapped Fourier power (or magnitude) spectrum should be used. In fact,
neither representation seems entirely adequate. The former representation
(CL mapped image function) is indeed invariant, within a linear shift
(WALS-invariant), to rotation and scaling of the image about a single
image point. However, such a representation is not invariant to
translation of an image and the effects of illumination and reflectance
are not in any way decoupled from one another. The second representation
mentioned above (CL conformally mapped power spectrum of an image) is
strictly invariant to translation and WALS-invariant to rotation and
scaling of an image, but it does not uniquely represent an image since
Fourier phase information is discarded. As alluded to earlier in our
discussion of multiplicative signal separability, the loss of phase
information makes Fourier spectra especially ill-suited for imagery
containing clutter or multiple objects to be recognized. Since the
WALS-invariance properties of the above-mentioned representations arise
exclusively from using the CL conformal mapping, one is naturally led to
consider whether this mapping could be used to develop considerably more
robust image representations (perhaps simultaneous spatial/spatial-
frequency representations) which are similarly WALS-invariant to rotation
and scale changes. We show in (11) that this is possible.
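
The WALS-invariance of the CL mapping can be checked numerically with a minimal sketch. The helper names cl_map, rotate_scale and cl_shift, and the sample points, are illustrative assumptions rather than anything defined in the report. Treating an image point as a complex number z, the mapping is w = ln z, so scaling by s and rotating by an angle a about the origin adds the same offset (ln s, a) to every mapped point.

```python
import cmath
import math

def cl_map(x, y):
    """Complex-logarithmic map of image point (x, y): returns (ln radius, angle)."""
    w = cmath.log(complex(x, y))
    return w.real, w.imag

def rotate_scale(x, y, scale, angle):
    """Rotate by `angle` (radians) and scale by `scale` about the origin."""
    z = complex(x, y) * scale * cmath.exp(1j * angle)
    return z.real, z.imag

def cl_shift(p, q):
    """Shift from CL point p to CL point q, with the angle wrapped to [-pi, pi]."""
    d_log_r = q[0] - p[0]
    d_angle = math.atan2(math.sin(q[1] - p[1]), math.cos(q[1] - p[1]))
    return d_log_r, d_angle

# Hypothetical image points and distortion parameters.
points = [(1.0, 0.5), (-2.0, 3.0), (0.3, -4.0)]
scale, angle = 2.5, 0.7

shifts = [
    cl_shift(cl_map(x, y), cl_map(*rotate_scale(x, y, scale, angle)))
    for x, y in points
]
# Every point is displaced by the same vector (ln scale, angle) in the CL plane.
```

A translation of the image does not produce a common shift in the CL plane, which is why the CL-mapped image function alone is not translation invariant.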

To review our progress thus far, we now have an 8-dimensional
representation which is WALS-invariant to all common geometric distortions
of rigid planar forms and which, given reasonable assumptions, is also
invariant to non-uniform illumination. We briefly describe next how such
a representation can be used to actually perform image pattern
recognition.

The memory prototype pattern characterizing some arbitrary planar form is
just the 4-D CL conformally mapped WD of a frontoparallel view of that
form, where spatial domain mapping has been performed about some arbitrary
point (see Figure 12). If that same planar form is ever "seen" again, its
presence is detected by mathematically correlating the 4-D prototype with
the previously defined 8-D image representation $W_{\ln f}$ for all linear
shifts in a 6-D hyperspace. If the resulting 6-D correlation function
exceeds threshold somewhere, then recognition is achieved. The location
of the suprathreshold peak in the 6-D correlation space furthermore
specifies the object position (in distance-normalized coordinates) within
the object plane, the distance-normalized size, the orientation within
the object plane and finally, the slant and tilt of the object plane within
which the planar form lies -- all relative to the fixation point, size,
and orientation of the pattern (in frontoparallel view) from which the
matching template was formed.
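
The detect-by-correlation step can be sketched in two dimensions as a stand-in for the 6-D case. Everything specific here is an assumption made for illustration: the random 32 x 32 template, the use of circular (FFT-based) correlation, the threshold of half the template energy, and the function names. The point shown is only that the location of the suprathreshold peak recovers the shift between the stored prototype and the observed pattern.

```python
import numpy as np

def circular_correlation(representation, template):
    """Circular cross-correlation of two equally sized arrays via the FFT."""
    r_hat = np.fft.fftn(representation)
    t_hat = np.fft.fftn(template)
    return np.fft.ifftn(r_hat * np.conj(t_hat)).real

def detect(representation, template, threshold):
    """Return (recognized?, peak location) for the correlation function."""
    corr = circular_correlation(representation, template)
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return bool(corr[peak] >= threshold), tuple(int(i) for i in peak)

# Hypothetical prototype and a shifted copy standing in for the observed scene.
rng = np.random.default_rng(0)
template = rng.standard_normal((32, 32))
true_shift = (5, 11)
scene = np.roll(template, true_shift, axis=(0, 1))

found, shift = detect(scene, template, threshold=0.5 * np.sum(template ** 2))
```

In the representation described above, the axes of the shift would be the six pose parameters (position, size, orientation, slant and tilt) rather than two image coordinates.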

We have described an approach to invariant image pattern recognition which
utilizes an 8-dimensional analogical image representation. This
representation decouples common dimensions of variability in the image
formation process to reveal particular 4-D canonical patterns that
characterize arbitrary planar gray-scale forms invariant to imaging
geometry and scene illumination. Canonical patterns that are embedded
within the 8-D representation can be detected by mathematically
correlating the 8-D representation with corresponding 4-D canonical
patterns stored in visual memory. Furthermore, once a given canonical
pattern has been identified, a number of geometric attributes of the
corresponding planar objects in a scene are specified by the location of
the suprathreshold peak in the corresponding correlation function.

Work is under way to investigate the performance and computational
feasibility of the methods described in this paper. In particular, we
will be using a discretely sampled approximation to the proposed 8-D
representation. Though the amount of computation required is substantial,
combinatorial explosion is not a problem since each dimension of the 8-D
representation need only be encoded at a small sampling of discrete
points. This follows from the fact that five of the eight dimensions
correspond to finite angular axes, and the other three dimensions are
logarithmic distance axes. Furthermore, referring to the computation tree
for the 8-D representation (Figure 13), it should be apparent that
computation of each 4-D function found at any leaf node of the tree can be
carried out independent of all other such nodes. A leaf node
representation provides WALS-invariance to rotation and scaling about a
single point within a single object plane. Therefore, a reasonable
strategy would be to seek recognition of planar forms by sequentially
shifting "attention" from one leaf node to the next. This would eliminate
the need to compute the entire 8-D representation in parallel. It could
also lead to efficient strategies for handling image frame sequences by
exploiting the context of earlier frames to guide an attentional
mechanism.
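
A rough count shows what sequential attention would save. The sample counts per axis are hypothetical, and the split of a 4-D leaf into two angular and two logarithmic axes is an assumption based on the CL mapping producing one log-radial and one angular axis in each of the spatial and spatial-frequency domains; the report does not state these numbers.

```python
from math import prod

ANGULAR_SAMPLES = 16  # hypothetical samples per finite angular axis
LOG_SAMPLES = 8       # hypothetical samples per logarithmic distance axis

def grid_size(n_angular_axes, n_log_axes):
    """Number of sample points on a grid with the given mix of axes."""
    return prod([ANGULAR_SAMPLES] * n_angular_axes + [LOG_SAMPLES] * n_log_axes)

# Five angular and three logarithmic axes, as stated in the text.
full_8d = grid_size(5, 3)

# Assumed split: each 4-D leaf function has two angular and two log axes.
leaf_4d = grid_size(2, 2)

# Number of leaf nodes a sequential attention mechanism could visit in turn.
n_leaves = full_8d // leaf_4d
```

Under these assumptions the full grid holds 536,870,912 samples, while a single leaf holds 16,384, so visiting one leaf at a time keeps the working set small at the cost of up to 32,768 sequential evaluations.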

IMPACT  ON  STATE-OF-THE-ART 

As seen in the previous sections, the Hierarchical Multisensor Image
Understanding program has resulted in the development of various concepts
and techniques which have the effect of extending the knowledge and
capabilities of image understanding systems. The most noteworthy of these
new concepts and techniques are:

o Context Independent Scene Segmentation. This is a unique concept
for segmenting a scene in such a way that, unlike conventional
approaches, the procedure does not depend on specific object or
scene models. This context-free scene segmentation is of high
significance because it removes the dependence on prior models
that limits traditional scene segmentation.

o Dynamic Spatio-Temporal Knowledge Representation and Inference.
This aspect of the research investigates approaches for
representing and processing knowledge on spatial constraints among
different scene objects and their temporal changes. Specific
results are contained in the Archival Scene Model (ASM), a highly
compact data structure. ASM represents all the important scene
information such as relationships between objects in terms of
position and velocity, image-motion compensation, relative motion
of objects with respect to the image window, and background
description.

o Hierarchical Planning for Control of Information Flow. Automated
planning is the specification of a path through internally
represented world states beginning at the current state and ending
at a goal world state. Honeywell's automated planning consists of
obtaining or deducing basic informational steps, allocation of
information processing resources, and the verification of the
internal world states with the observed real world. Honeywell's
concept of an automated planner includes provisions to deal with
potential false information (inferred and/or acquired), parallel
goal expansion which considers resource allocation as an
additional dimension of planning, a conflict resolution knowledge
base, and replanning mechanisms.

Besides developing these new concepts and techniques, the program effort
also opened many new doors in the image understanding area by
demonstrating the feasibility and potential of various additional
concepts. The new areas of research in which we recommend further work
fall into:

1. Spatio-temporal evidence accrual concepts and belief systems. The
objective is to develop a goal-directed system to search spatially
(across a scene) and temporally (through a sequence of scenes) for
information that will minimize a measure of entropy in outdoor
scenes (or maximize a measure of scene explanation).

2. Modeling and representation of scene invariant multisensor
information via invariant matrices and normalization filters.
This representation captures the essential information about the
structural and perceptual invariants in the scene, regardless of
relative size and orientation. We want to extend our work in
understanding from scene synthesis to actually infer causal models
that explain the dynamics and purpose of scene objects via AI-based
inference engines.

The results of our efforts will provide a stepping stone for the future
development of mature multisensor vision system control structures, a
framework for the development of specific multisensor systems and an
understanding of the commonalities within and between sensor domains.
Furthermore, a functional model for synergistic use of complementary and
redundant information at all levels of processing of a multisensor vision
system across sensory domains will provide investigators with a tool to
experiment with and evaluate new concepts and paradigms in computer vision.

Figure 1. Approach to a Region Discriminated Image

A. Original FLIR Image of Part of a Power
Plant


B.  Results  of  Four  Different  Operators 
for  Combining  Primitive  Region 
Discrimination  Cues  into  a  Labeled 
Image 


C.  Discriminated  Regions  Superimposed  on 
the  Original  Image.  This  is  the  Result 
of  Conflict  Resolution  on  the  Four 
Labeled  Images  of  B 


Figure 2. Context Free Region Discrimination

FIGURE 7 INFERENCE DRIVEN RECURSIVE REFINEMENT

Figure 10. Multilayer hierarchies within decision units of
a multiechelon system.

Figure 11. Decision units of multilayer hierarchy presented as multilayer
or multiechelon hierarchies. (18)


Figure 12. (a) The "fixation point" about which CL mapping is performed when
deriving the template; (b) The identified form whose
distance-normalized position in its object plane (o • t ,t) is specified
in polar coordinates by (e*°cosp . eXosinp°), and whose
distance-normalized size and orientation, relative to the template form, are
denoted by KQ and © , respectively.

TABLE 1 EXAMPLE OF EVIDENCE ACCRUAL


Picture  3.  Operator  1  on  full  image  of  power  plant  scene. 


Picture  4.  Operator  2  on  full  image  of  power  plant  scene 


Picture  5.  Region  from  Picture  4. 


Picture 6. Operator 1 on same region as 5.


Picture  7.  Another  region  from  Picture  4. 


Picture  8.  Operator  1  on  same  region  as  7. 


Picture  9.  Operator  2  on  region  in  Picture  7. 


Picture  10.  Operator  2  on  region  in  Picture  7  with 
higher  edge  threshold. 


Picture  11.  Original  FLIR  image  of  highway  scene. 


Picture  12.  Highway  scene  after  noise  cleaning. 


Picture  14.  Operator  2  on  full  image  of  highway  scene. 


Picture  15.  A  region  from  Picture  13. 


Picture  16.  Operator  1  on  same  region  as  15. 


Picture  17.  A  region  from  Picture  14. 


Picture  18.  Operator  1  on  same  region  as  17. 


Picture  19.  Another  region  from  Picture  14. 


Picture  20.  Operator  1  on  same  region  as  19. 


Picture  21.  Another  region  from  Picture  14. 


Picture  22.  Operator  2  on  same  region  as  21. 


Picture  23.  Operator  1  on  region  from  Picture  #21. 


Picture 24. Region from same segmentation as 23.


Picture 25. Operator 1 on full image of highway scene


Picture  26.  A  region  from  Picture  25. 


Picture  29.  Outdoor  Scene  with  Trucks  and  Tanks:  Frame  3 


Picture 30. Outdoor Scene with Trucks and Tanks: Frame 4.


Picture 31. Outdoor Scene with Trucks and Tanks: Frame 5.


Picture 32. Outdoor Scene with Trucks and Tanks: Frame 6.


